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1.    Introduction. 

In  recent  years  much  research  has  focused  on  the  problem  of  finding  efficient  precondi- 
tioners to  use  with  various  iterative  methods  for  solving  linear  systems.  Examples  of  precon- 
ditioners, or  of  iterative  methods  that  can  be  viewed  as  using  special  preconditioners,  include 
the  incomplete  Cholesky  factorization  [19],  the  SSOR  preconditioner  [25],  multigrid  methods 
[2],  domain  decomposition  techniques  [1],  hierarchical  basis  functions  [26],  and  many,  many 
more. 

An  efficient  preconditioner  M  for  a  matrix  A  must  possess  two  properties: 

1.)  Linear  systems  with  coefficient  matrix  M  must  be  relatively  easy  to  solve;  and 

2.)  The  matrix  M  must  "approximate"  the  matrix  A. 

Many  of  the  preconditioners  that  have  been  proposed  are  easy  to  solve  because  of  their  spar- 
sity pattern  or  because  they  are  products  of  known  lower  and  upper  triangular  matrices  with 
simple  sparsity  patterns. 

The  sense  in  which  M  should  "approximate"  A  differs  according  to  the  iterative  method 
to  be  used.  For  simple  iterative  refinement  methods  (x*  =  x*_1  +  M~l(b-Axk~1)),  the 
asymptotic  convergence  rate  is  determined  by  the  spectral  radius 

p(/  -  Af-'A). 

For  fast  asymptotic  convergence,  this  quantity  should  be  small. 

When  the  matrices  A  and  M  are  symmetric  and  positive  definite,  this  basic  iterative 
method  can  be  accelerated  through  use  of  the  Chebyshev  or  conjugate  gradient  iteration.  For 
the  Chebyshev  or  conjugate  gradient  methods,  the  ratio 


k(M-'A) 


>W(M-'A) 
Xmi„(Af-'A) 
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-  2  - 
of  the  largest  to  the  smallest  eigenvalue  of  A/~'A  enters  into  an  upper  bound  for  the  error 

v7+T 


* 


To  make  this  bound  small,  k  should  be  close  to  1.  This  bound  is  sharp  for  the  Chebyshev 
method,  in  the  sense  that  there  is  an  initial  guess  for  which  the  bound  will  be  attained  at 
every  step.  It  is  not  sharp  for  the  conjugate  gradient  method.  A  sharp  error  bound  for  the 
conjugate  gradient  method  is  more  complicated  [13],  involving  the  distribution  of  all  eigen- 
values of  M"'A,  but  a  condition  number1  k(A/~'A)  close  to  1  is  sufficient  to  ensure  fast  con- 
vergence of  this  algorithm  as  well,  even  when  the  effects  of  finite  precision  arithmetic  are 
taken  into  account  [14].  Therefore,  we  will  define  "optimality"  in  terms  of  the  condition 
number  k(M ~'A)  and  minimization  of  this  quantity  will  be  our  goal. 

It  is  well-known  that  the  largest  eigenvalue  of  a  symmetric  matrix  S  is  a  convex  function 
of  the  elements  of  S.  Although  the  function  is  not  differentiable  at  points  where  eigenvalues 
coalesce,  (and  one  can  usually  expect  the  minimum  to  occur  at  such  a  point),  the  problem  of 
minimizing  this  function  can  be  handled  numerically  using  optimization  techniques.  If  the  ele- 
ments of  S  are  affine  functions  of  a  vector  x  of  unknowns 


S(x)  =  S0  +    2  Sk'k 

k=\ 

then  the  largest  eigenvalue  of  S(x)  will  be  a  convex  function  of  x.  Given  the  matrices 
Sk,  k  =  0,l,...,m  an  algorithm  due  to  M.  Overton  [20]  can  be  used  to  find  the  vector  x  for 
which  the  spectral  radius  of  S(x)  is  minimal.  The  algorithm  is  asymptotically  quadratically 
convergent  and  second  derivatives  are  not  required  to  obtain  this  quadratic  convergence  rate 
in  many  cases.  The  code  uses  a  variant  of  Newton's  method  to  minimize  a  related  nonlinear 
but  essentially  differentiable  function.  In  this  paper,  we  report  results  using  the  Overton 
optimization  code  to  find  optimal  preconditioners  of  a  given  sparsity  pattern. 

To  see  how  the  preconditioning  problem  can  fit  into  this  framework,  we  will  need  a  few 
simple  results.  Given  a  symmetric  positive  definite  matrix  A  and  a  sparsity  pattern  for  the 
symmetric  preconditioner  M ,  we  would  like  to  find  the  matrix  M  of  the  given  form  which 
minimizes  p(/-A/~'A)  or  k(A/_1A),  as  explained  above.  The  matrix  /-Af-1A  is  not  sym- 
metric, and  its  elements  are  not  affine  functions  of  the  elements  of  M,  but  the  following 
theorem  relates  the  minimization  of  p(l-M ~'A)  to  the  minimization  of  the  largest  eigenvalue 
of  /  -  L~XML~T,  where  LLT  is  a  factorization  (e.g.,  the  Cholesky  factorization)  of  A.  We 
start  with  the  following  simple  lemma. 

Lemma  1.  For  a  given  symmetric  matrix  Q  with  nonnegative  eigenvalues,  the  constant  c 
which  minimizes  p(I-cQ)  is 


we)  +  wo ' 

where  \min(£>)  is  the  smallest  and  \mtx(Q)  the  largest  eigenvalue  of  Q. 


raent  of  Energy  under  contract  DE-AC02-76ER03077. 

1  When  referring  to  the  condition  number  k(M_1A),  we  actually  mean  the  ratio  of  largest  to  smallest 
eigenvalue  of  M~ 'A,  or,  the  condition  number  in  the  2-norm  of  W~|/2AW"|/2.  Since  the  matrices  we 
consider  are  all  symmetric  and  positive  definite,  this  should  cause  no  confusion. 
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Proof:   For  c  as  defined  above,  the  spectral  radius  of  1-cQ  is  given  by 

p(I-cQ)  =  1  -  ckmin(Q)  =  c\m„(G)  "   1. 
For  any  other  constant  c'  ¥=c,  one  of  the  following  two  inequalities  must  hold: 
c'<c     -     1  -  c'XmlD(G)  >  1  -  cXmia(G),      or      c'>c     -     c'Xmw(Q)  -  1  >  cXm„(fi)  -  1, 
and  so  the  spectral  radius  of  I-c'Q  must  satisfy 

P(I-CQ)  >  p(/-cQ).  O 

Theorem  1.    Let  A=LLr  be  a  factorization  of  the  symmetric  positive  definite  matrix  A,  and 
let  M[  and  A/2  be  two  symmetric  positive  semi-definite  matrices  satisfying 

prV-^Z.-'A/,/.-^  <  p(/-c2Z.-,A/2Z.-r), 
where  the  constants  C]  and  c2,  minimize  these  spectral  radii,  as  described  in  Lemma  1;  i.e., 

C'  "    Xm,x(L-'M,L-0  +  XmiB(L-'W,L-0  '       ,=  1'2' 
Then  A/j  and  Af2  also  satisfy 

p(f-*,tffU)  <  p(/-c2Af2-U), 
where  Cj  and  c2  minimize  these  spectral  radii;  i.e., 

C'  =    *m.x(*f  *)  +  XmiB(Af -U)  '      '=1,2- 
Proof:  Because  of  the  choice  of  C!  and  c2,  we  have 

./,..,  -i„  ,  -rx  -    Xm„(L-'Af,L-Q  -  \min(I-'M,Z.-Q 

P(/    *     A/'L0-    Xma,(Z.-.M1L-0+Xmin(L-.M,L-0'      '-1'2' 

The  eigenvalues  of  L-IM,L~r,  i=l,2,  are  just  the  inverses  of  the  eigenvalues  of  LTM~]L,  or, 
of  M~*A.    Hence  we  have 

o(l-cL-iML-T)  =    W*r'A)  ~  Xmin(Af-U) 
P(        '  '        )        ^M-'A)  +  XmiB(itffU) ' 

and  because  of  the  choice  of  c,  and  c2,  the  right-hand  side  is  equal  to  the  spectral  radius  of 
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I  -  c,M  ~  ]A,  i=  1,2.    Thus,  we  have 

p(/-c,Af fU)  =  pil-CiL-iMyL-7)    <    p{I-c2L^M1L-T)  =  p{I-c2M2^A).  (J 


The  symmetric  matrix  I-L~XML~T  is  an  affine  function  of  the  elements  of  M.  Theorem 
1  shows  that  if  M,  minimizes  the  spectral  radius  of  this  matrix  over  all  symmetric  precondi- 
tioners  M  belonging  to  som:  set  which  also  contains  all  constant  multiples  of  its  members, 
then  there  is  a  constant  cx  such  that  c,Af ,  minimizes  the  spectral  radius  of  I-M~ ]A  over  this 
set.  The  constant  c,  is  defined  in  Lemma  1.  The  following  theorem  shows  that  this  same 
matrix  M,  also  minimizes  the  ratio  of  largest  to  smallest  eigenvalue,  k(M ~]A). 


Theorem  2.    Let  A=LLT  be  a  factorization  of  the  symmetric  positive  definite  matrix  A,  and 
let  M,  and  M2  be  two  symmetric  positive  semi-definite  matrices  satisfying 

p^-L-'A/,!"7)    <    p(l-c1L-^M2L-T), 

where  the  constant  c2  minimizes  this  spectral  radius;  i.e., 


2        \m„(L-itfaI-0  +  \mw(L-<M2L-T)  • 
Then  M,  and  M2  also  satisfy 

K(Aff'A)  <  K(W2-»i4). 
Proof:  By  the  choice  of  c2  we  have 

p(/     c2L     M2L     )                 kmix(L-iM2L-T)  +  Xmin(L->M2L-0  Xmax(L->M2L"0  +  Xmin(L-'M2L-0 

The  hypothesis  then  implies  that  the  eigenvalues  of  L~^MXL~T  obey 
1-X      r/-iM/-n<i 2Xmin(L-'Af2Z.-Q 


2Xmax(L-'Af2L-Q 
Xmal(L-'Af2L-0  +  Xm,n(L-'M2L-0 


Xro„(Z.-^1L-r)  -  l  <   %       /,.i;^.r,/- "  1  ■ 


or, 


^(L-Uf*-*)  >  *-^**3 


XmIX(L-'Af,L-0  < 


Xmai(L-'M2Z.-0  +  Xm,n(L-'M2L-0  ' 

2Xma>(L-'M2L-Q 
Xm„(L-'Af2L-0  +  Xmm(L-'M2Z.-0' 
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Dividing  the  second  inequality  by  the  first  then  gives 

\max(Z.-'Af,I-r)  Xm»(Z.-'M2L-Q 

and  since  the  eigenvalues  of  L~XM,L~T,  i=l,2,  are  just  the  inverses  of  the  eigenvalues  of 
M~XA,  the  desired  result  follows.  CJ 


In  the  following  examples  we  find  the  matrix  M  in  a  given  set  which  minimizes 
p(/  — L_1AfL_r).  According  to  Theorem  2,  this  also  minimizes  k(M~'A)  over  the  set. 
Theorems  1  and  2,  together  with  the  previous  discussion,  justify  referring  to  this  matrix  (or 
an  appropriate  constant  multiple  of  this  matrix)  as  the  "optimal"  preconditioner  from  the  set 
for  use  with  the  Chebyshev  method  or  with  iterative  refinement.  Other  criteria  are  possible 
for  defining  a  good  preconditioner  for  the  conjugate  gradient  method  [6,21],  but  this  is  a  rea- 
sonable and  simple  measure,  and  if  there  is  a  matrix  M  in  the  set  for  which  this  measure  is 
small,  then  the  conjugate  gradient  method  with  this  preconditioner  will  have  guaranteed  fast 
convergence. 

This  same  idea  of  minimizing  the  condition  number  was  actually  pursued  for  a  certain 
class  of  preconditioners  much  earlier  by  Concus  and  Golub  [5].  They  considered  1-D  model 
problems  and  used  a  code  by  Fletcher  [10]  to  find  optimal  diagonal  scalings  of  the  Laplacian 
to  use  as  a  preconditioner. 

2.    Theoretical  Results. 

There  are  few  theoretical  results  concerning  optimal  preconditioners  for  most  possible 
sparsity  patterns.  An  exception  is  the  case  of  diagonal  and  block  diagonal  preconditioners. 
Van  der  Sluis  [22]  proved  the  following  theorem  about  diagonal  scaling  of  a  symmetric  posi- 
tive definite  matrix  A: 


Theorem  (Van  der  Sluis).    Let  D  be  the  diagonal  of  the  symmetric  positive  definite  matrix  A, 
and  let  D  be  any  other  positive  definite  diagonal  matrix.    Then  k (O  ~ inAD ~ 1/2)  satisfies 

k(D-1/2AD"1/2)  <:  m  k(DAD), 

where  m  is  the  maximum  number  of  nonzeros  in  any  row  of  A. 


Thus,  D  =  diag(A)  approximately  minimizes   k(M    'A)  over  all  diagonal  preconditioners  M. 
When  the  matrix  A  also  possesses  property-A,  a  stronger  result  holds  [11]: 


Theorem  (Forsythe  &  Strauss).    Using  the  above  notation,  if  the  symmetric  positive  definite 
matrix  A  has  property-A,  then  k(D"1/2AD~1/2)  satisfies 


k(D"1/2AD-1/2)  <  k(DAD). 


6  - 


In  this  case,  then,  D  =  diag{A)  is  the  optimal  diagonal  preconditioner  for  the  matrix  A. 

A  generalization  of  the  Van  der  Sluis  theorem  has  also  been  proved  for  block  diagonal 
preconditioners  [7]. 


Theorem  (Demmel).  Let  D  be  the  block  diagonal  of  the  symmetric  positive  definite  matrix 
A,  and  let  D  be  any  other  symmetric  positive  definite  block  diagonal  matrix  with  the  same 
size  blocks.    Then  K.(D~mAD~in)  satisfies 


k(D"1/2AD-1/2)  ^  b  k(DAD), 


where  b  is  the  number  of  blocks  in  D. 


A  result  similar  to  that  of  Forsythe  and  Strauss  has  also  been  proved  for  block  diagonal 
preconditioners  [9],  when  the  matrix  A  is  block  2-cyclic  and  is  permuted  into  the  form 


A    = 


D,    CT 
C    D-, 


D, 


0 


0 
D:2 


0       0 


i=l,2. 


(2.1) 


Theorem  (Eisenstat, Lewis, Schultz).    Let  A  be  of  the  form  (2.1),  and  let  D  be  the  block  diag- 
onal matrix  whose  diagonal  blocks  are  {D{i,  ■  •  •  ,Dl  r  ,D2x,...,Di  rJ.    Let  D  be  any  other 

block  diagonal  matrix  with  the  same  size  blocks.   Then  K(D~1/2AD~]n)  satisfies 

K(D-xrlAD-m)  S  k(DAD). 


There  appears  to  be  little  known  theoretically  about  optimal  preconditioners  of  more 
general  sparsity  patterns;  e.g.,  tridiagonal  or  banded.  There  is,  however,  a  useful  theorem 
due  to  Varga  [23]  for  comparing  "regular  splittings",  which  in  some  cases,  enables  one  to 
determine  the  optimal  preconditioner  among  all  regular  splittings  of  a  given  sparsity  pattern. 

Definition.    For  n  by  n  real  matrices  A,  M,  and  N,  A=M-N  is  a  regular  splitting  of  the 
matrix  A  if  M  is  nonsingular  with  M"'&0,  and  iV^O. 

Theorem   (Varga).     Let    A  =  M^  -  N \  =  M2  -  N2    be  two  regular  splittings  of  A,  where 
i4"'>0.    If  #2^,2=0,  (and  neither  Nx  nor  N2-Ni  is  the  null  matrix),  then 

1  >  p(A/2-'N2)  >  p(Af fW,)  >  0. 


This  theorem  implies  that  among  all  regular  splitting  matrices  M  of  a  given  bandwidth, 
for  example,  the  optimal  one  for  minimizing  the  spectral  radius  of/  —  M~]A  is  M  =  band(A). 
This  matrix  is  closer  clement-wise  to  A  than  is  any  other  member  of  the  class,  and  so,  by  the 
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theorem,  is  a  better  splitting,  or,  preconditioner.  As  will  be  shown  in  the  following  exam- 
ples, however,  there  may  be  better  banded  preconditioners  outside  the  class  of  regular  split- 
tings. This  theorem  is  quite  general  and  important  to  remember.  Unfortunately,  however, 
many  of  the  most  effective  preconditioners  are  not  regular  splittings,  and  so  it  is  of  limited 
applicability. 

It  should  also  be  noted  that  because  the  set  of^  regular  splittings  does  not  contain  all 
constant  multiples  of  its  members,  the  matrix,  say,  M,  from  some  class  of  regular  splittings, 
which  minimizes  p(/-A/~1A)  over  that  class  does  not  necessarily  minimize  k(M~]A).  The 
hypotheses  of  Varga's  theorem,  together  with  the  assumption  that  A,  M ,,  and  M2  are  sym- 
metric and  positive  definite,  do  not  imply  *(A/f  'A)  :£  k(M2~]A).  Only  if  the  matrices  M,  and 
c2M2  "  where  c2  minimizes  p(/-cM2_1A),  as  explained  in  Lemma  1  -  only  if  these  matrices 
satisfy  the  hypotheses  of  Varga's  theorem  would  it  also  follow,  from  Theorem  2,  that 
k(A/j~'A)  s.  k(M2_1A).  With  a  weaker  assumption  about  the  matrix  M2,  however,  it  can  be 
shown  that  ic(Mf  'A)  <  2  k(M2~]A). 

Theorem  3.  Let  A,  Mx,  and  M2  be  symmetric  positive  definite  matrices  satisfying  the 
hypotheses  of  Varga's  theorem,  and  suppose  the  largest  eigenvalue  of  A/f 'A  is  greater  than 
or  equal  to  1.  (This  would  be  the  case,  for  example,  if  A  and  M2  have  at  least  one  diagonal 
element  equal,  since  then  the  symmetric  matrix  A^  would  have  a  zero  diagonal  element  and 
hence  M2XN2  would  have  a  zero  eigenvalue.)  Then  the  ratios  of  largest  to  smallest  eigen- 
values of  Mf'A  and  Mf'A  satisfy 

k(M,"M)  <  2  k(W2-'A). 

Proof:  Since  the  elements  of  M2XN2  are  nonnegative,  the  Perron-Frobenius  theorem  states 
that  its  spectral  radius  is  equal  to  its  (algebraically)  largest  eigenvalue: 

p(A/2"W2)  =  p(/-A/2-M)  =  1  -  Xmm(M2-'A). 

The  result  p(Af  \lN{)  <  p(Af2-1W2)  from  Varga's  theorem  implies 

1  "  W*rfA)  <  1  -  ^,«(Wf'1)       and      Xmal(Mf'A)  -  1<  1  -  \mw(M2^A), 

or,  equivalently, 

\miD(Aff>A)  >  Xmm(M2-'A)       and      Xm„(JHfM)  <  2  -  \m,n(M2-'A). 

Dividing  the  second  inequality  by  the  first  gives 

K(Mf'A)  <  K(Af2-'A)'    2  7  X7°(M,2"!'4)    <  K(Aff'A)'  2, 

Xmax(«2    A) 

with  the  last  inequality  holding  because  Xma,(Af2_1A)  a  1,  and  Xm;n(M2_1A)  >  0  since 
p(M2-W2)  <  1.  O 


A  somewhat  more  general  result  can  be  proved  for  symmetric  matrices. 

Definition.  A  splitting  A  =  M  -  N  is  said  to  be  a  weak  regular  splitting  of  the  matrix  A  if  M 
is  nonsingular  with  M"'JV^0. 

We  then  get  the  following  comparison  theorem. 

Theorem  4.  Suppose  A  and  M  are  symmetric  positive  definte  matrices  such  that  A  =  M  -  N 
is  a  weak  regular  splitting  of  A.  Let  Mq  =  M  +  Q  for  some  symmetric  matrix  Q  such  that 
vTQv  £:  0  whenever  v  >  0.    Then 

p(M^N)^p(M^N0) 
where  A  =  Mn  -  No- 


Proof:  Let  Xmin(A/~'A)  and  \mia(MQlA)  denote  the  smallest  eigenvalues  of  M~]A  and  of 
Mq*A,  respectively.  Since  M~lN  has  nonnegative  elements,  the  Perron-Frobenius  theorem 
states  that  its  spectral  radius  is  equal  to  its  (algebraically)  largest  eigenvalue.    Hence  we  have 

p(M-W)  =  p(/-Af"'A)  =  1  -  Xmm(Af-'A). 

If  v  is  the  eigenvector  corresponding  to  the  largest  eigenvalue  of  M~lN,  then  the  Perron- 
Frobenius  theorem  also  states  that  the  elements  of  v  are  positive.  Hence  Xmm(Af ~'A)  also 
satisifes 

XmiD(A/->A)  =  ^AL  *  jy  *  miD  -^-  =  X^A^U), 

v'A/v         v'Afv  +  v'Qv         v*o    v'Mqv 

and  from  this  the  desired  result  follows: 

p(MqINq)  =  p{I-Mq1A)  £  1  -  Xmin(Me'A)  ^  1  -  Xmin(M-'A)  =  p(Af-'rV).  {J 


Other  work  on  iterative  methods  has  focussed  on  preconditioners  that  are  "optimal"  in  a 
different  sense  from  that  being  considered  here  [e.g.,  2].  Mulrigrid  methods,  for  example, 
are  "optimal"  in  the  sense  that  the  preconditioned  matrix  has  condition  number  0(1), 
independent  of  the  size  mesh  from  which  the  linear  system  was  obtained  (assuming  that  the 
linear  system  comes  from  a  finite  element  approximation  to  an  elliptic  partial  differential 
equation  with  sufficiently  smooth  solution).  In  this  paper,  to  be  considered  "optimal",  a 
preconditioner  of  the  form  of  the  multigrid  preconditioner,  must  not  only  give  a  condition 
number  0(1),  but  the  constant  must  be  as  small  as  possible,  too. 

3.    Experimental  Results  for  the  5-Point  Laplacian. 

In  the  following  sections  we  report  experimental  results  using  the  Overton  optimization 
code  [20]  to  find  the  matrix  M  of  a  given  form  which  minimizes  p(I-L~]ML~T),  for  a  given 
matrix  A  =  LLT.  According  to  theorems  1  and  2,  an  appropriate  constant  multiple  of  this 
matrix  also  minimizes  p(/-M_1A)  and  tc(Af  ~'A)  over  all  matrices  M  of  the  given  form.  The 
matrix   A    was   taken    to    be   the   5-point   Laplacian   on   a   square   with    Dirichlet   boundary 


conditions: 
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A  = 


£f  D2 


(3.1) 


Dt 


4      -1 
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-1      4 


.    £< 


-1 


-1 


,/=!,.  ..,m. 


In  most  cases,  the  matrix  M  was  taken  to  have  a  fixed  sparsity  pattern;  e.g.,  diagonal  or 
tridiagonal.    The  matrix  l-L~xML~r  can  then  be  expressed  in  the  form 


/  -  L~lML-T  =  / 


£2  wt,  v<*-«> 


where  the  sum  is  over  all  indices  (k,q)  such  that  Mkq  is  allowed  to  be  nonzero,  and  the 
matrices  y(*.?>  are  given  by 

(V(*.*>)„  =  (t-')ft(L-');,,     «J=l,...,n. 

The  matrices  /  and  -  V<*-?>,    (*,<?)€  {indices  of  free  elements  of  M)  are  the  input  to  the  optim- 
ization code. 

In  some  cases,  slightly  different  forms  for  M  were  considered.  For  example,  in  one 
experiment,  we  found  the  matrix  M_1  having  the  same  sparsity  pattern  as  A,  for  which 
p(l  —  LTM~xL)  was  minimal.  This  is  equivalent  to  minimizing  p(/-A/"'A),  and  from 
Theorem  2,  this  also  minimizes  k(A/_1A)  over  all  matrices  M"1  with  the  given  sparsity  pat- 
tern. The  elements  of  LTM~XL  are  again  linear  functions  of  the  free  elements  of  M -1,  and  so 
p(I-LTM~]L)  can  be  minimized  with  the  same  optimization  code. 

Another  preconditioning  problem  considered  was  one  involving  not  A  but  the  Schur 
complement  C  in  A  of  a  block  corresponding  to  a  dividing  line  in  the  center  of  the  square 
domain: 


If  nodes  in  fi,  are  numbered  first,  then  nodes  in  fl2,  and  finally  nodes  on  the  boundary  r3, 
then  the  matrix  A  takes  the  form 
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A  = 


ffll 

0     ATI3 

0 

*22    ^23 

Kh 

^23    ^33 

(3.2) 


The  Schur  complement  of  #33  in  A  is  defined  as 

C  =  tf33  -  JrfcrfilJTu  -  K^K^K^  ,  (3.3) 

and  the  problem  of  solving  a  linear  system  with  coefficient  matrix  A  can  be  reduced  to  one  of 
solving  a  smaller  linear  system  with  coefficient  matrix  C.  This  is  the  basis  of  many  domain 
decomposition  methods  [3].  One  still  needs  a  good  preconditioner  for  the  matrix  C,  and 
here  we  considered  preconditioners  with  a  given  sparsity  pattern;  e.g.,  tridiagonal,  as  well  as 
certain  other  forms;  e.g.,  Toeplitz.  Again,  the  matrix  C  was  factored  in  the  form  LLT ,  and 
the  function  p(/-L_1ML_r)  was  minimized.  This  is  a  convex  function  of  the  nonzero  ele- 
ments of  M  (when  M  is  restricted  to  have  a  certain  sparsity  pattern)  or  of  the  values  on  each 
diagonal  of  M  (when  M  is  restricted  to  be  Toeplitz). 

Many  of  the  most  efficient  preconditioners  M  are  easy  to  solve  not  because  they  them- 
selves have  a  special  sparsity  pattern,  but  because  they  are  of  the  form  KKT,  where  K  is  a 
lower  triangular  matrix  with  a  simple  sparsity  pattern;  e.g.,  that  of  the  lower  triangle  of  A. 
The  problem  of  finding  the  matrix  K  of  a  given  form  which  minimizes  p(/-L_1^ArrZ._r)  is 
more  difficult  than  the  previously  described  problems,  however,  because  the  matrix 

/  -  L'^KKTL-T  (3.4) 

is  not  an  affine  function  of  the  elements  of  K.  Moreover,  its  spectral  radius  is  not  a  convex 
function  of  K ,  and  the  function  may  have  local  minima. 

The  optimization  code  is  easily  modified  to  handle  the  case  of  matrix  elements  which 
are  nonlinear  functions  of  the  unknowns,  but  there  is  no  guarantee  that  the  solution  it  finds 
will  be  the  global  minimum.  Still,  it  will  be  shown  in  the  numerical  examples  that  the  optimi- 
zation code  is  able  to  find  preconditioners  of  the  form  (3.4),  where  K  has  a  fixed  sparsity 
pattern,  that  are  significantly  better  than  many  currently  used  preconditioners.  We  con- 
sidered matrices  K  having  the  same  sparsity  as  the  lower  triangle  of  A,  and  compared  the 
preconditioner  KKT  returned  by  the  optimization  code  with  the  incomplete  Cholesky  decom- 
position [19],  the  modified  incomplete  Cholesky  decomposition  [15],  and  the  SSOR  precondi- 
tioner [25].  We  also  considered  matrices  K  having  the  same  sparsity  pattern  as  the  lower  tri- 
angular factor  of  the  hierarchical  basis  function  preconditioner  [26]  and  compared  the  condi- 
tion number  for  this  algorithm  with  the  locally  minimized  condition  number  returned  by  the 
optimization  code. 

Experiments  that  have  been  carried  out  so  far  are  for  very  small  problems.  It  is 
planned  to  continue  this  work  on  larger  problems  when  the  optimization  code  has  been 
ported  to  larger  and  faster  machines.  It  should  be  stressed  that  this  is  not  meant  as  a  practi- 
cal means  for  finding  a  good  preconditioner  for  a  given  problem.  It  is  much  easier  to  solve 
the  linear  system  than  it  is  to  find  the  optimal  preconditioner  of  a  given  class.  Rather,  the 
optimization  code  is  meant  to  provide  insight  into  the  properties  of  preconditioners.  If  the 
optimal  preconditioner  of  a  certain  form  for  a  given  problem  does  not  give  rise  to  a  precon- 
ditioned matrix  with  small  condition  number,  then  it  is  not  worthwhile  considering  precondi- 
tioners of  that  form  (unless  such  preconditioners  can  exhibit  other  desirable  properties,  such 
as  tight  clustering  of  most  of  the  eigenvalues).  On  the  other  hand,  if  the  code  shows  that 
there  is  a  good  preconditioner  of  the  given  form,  then  it  still  may  or  may  not  be  possible  to 
compute  such  a  preconditioner  in  a  reasonable  amount  of  time. 
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3.1.  Diagonal  Preconditioning  for  A. 

As  a  check  on  the  code,  we  first  tried  computing  the  optimal  diagonal  preconditioner 
for  the  matrix  A  of  (3.1).  According  to  the  Forsythe  and  Strauss  theorem  of  Section  2,  this 
is  simply  D  =  diag(A).  Indeed,  even  from  very  far  off  initial  guesses,  the  code  always  con- 
verged to  M  =  diag(A)  and  indicated  that  it  had  successfully  found  the  minimum.  This  gave 
us  confidence  to  try  problems  for  which  the  answers  were  not  known. 

3.2.  Tridiagonal  Preconditioning  for  A. 

By  numbering  the  odd  block-rows  and  block-columns  first  and  the  even  ones  last,  the 
matrix  A  of  (3.1)  can  be  permuted  into  block  2-cyclic  form,  without  changing  the  diagonal 
blocks.  It  follows  from  the  Eisenstat,  Lewis,  Schultz  theorem  of  Section  2,  that  the  optimal 
block  diagonal  preconditioner  for  A  is  D  =  block  diag(A).  This  matrix  is  also  tridiagonal. 
The  optimization  code  was  used  to  compute  the  best  tridiagonal  preconditioner  for  A ,  and  it 
was  found  to  be  slightly  better  than  the  block  diagonal  matrix  D.  Fig.  1  shows  condition 
numbers  for  A,  D~XA,  and  M~lA,  where  M  is  the  optimal  tridiagonal  preconditioner,  plotted 
against  h~2.  From  the  figure,  it  appears  that  all  of  these  matrices  have  condition  number 
0(h~2),  and  based  on  these  results  we  make  the  following  conjecture: 

Conjecture  1.  Let  Ah  be  the  5-point  Laplace  matrix  of  (3.1),  for  grid  size  h,  and  let  Mh  be 
any  symmetric  positive  definite  tridiagonal  preconditioner  for  Ah.  Then  the  condition  number 
K(MhlAh)  of  the  preconditioned  matrix  satisfies 

k(M„->A„)  ^  0(A-2)  . 


Based  on  these  results  we  conclude  that  tridiagonal  preconditioners  for  A  cannot  be 
very  effective,  in  terms  of  giving  a  small  condition  number,  for  small  values  of  h.  Additional 
experiments  were  performed  to  find  optimal  preconditioners  of  slightly  larger  bandwidths, 
but  again  they  appeared  to  give  condition  numbers  for  the  preconditioned  matrix  that  were 
O(h-i). 

The  elements  of  the  optimal  tridiagonal  preconditioner  for  different  values  of  h  are 
listed  in  Table  1.  The  tridiagonal  matrix  given  in  the  table  has  been  multiplied  by  an 
appropriate  constant  so  that  it  minimizes  p(/-Af ~U)  (although  the  code  actually  computes 
the  tridiagonal  matrix  which  minimizes  p(/-A"'A/)).  Note  that  the  preconditioners  are  per- 
symmetric  (symmetric  about  their  northeast  to  southwest  diagonal),  as  would  be  expected 
due  to  the  symmetry  of  the  problem.  This  restriction  was  not  enforced  on  the  class  of 
matrices  from  which  the  optimal  one  was  to  be  found,  and  so  it  gives  further  evidence  that 
the  optimization  code  is  functioning  correctly.  Also  given  in  Table  1  are  some  of  the  eigen- 
values of  the  preconditioned  iteration  matrices  I-M~lA.  Note  that  for  all  problem  sizes,  the 
largest  and  smallest  eigenvalues  of  the  preconditioned  iteration  matrix  each  have  multiplicity 
2.    Based  on  this  evidence  we  make  the  following  conjecture: 

Conjecture  2.  Let  Ah  be  the  5-point  Laplace  matrix  of  (3.1)  for  grid  size  h,  and  let  Mh  be  the 
optimal  tridiagonal  preconditioner  for  Ah.  If  A.,<\2£  •  •  •  s\n_,sX.n  are  the  eigenvalues  of 
Mh]Ah,  then  the  largest  and  smallest  eigenvalues  satisfy 

X,  =  X.2  ,      ^n-l  =  ^n  • 
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h=l/5 

h=l/6 

h=in 

eigenvalues 

eigenvalues 

eigenvalues 

-0.634,  -0.634,  -0.255, 

-0.732,  -0.732,  -0.406, 

-0.798,  -0.798,  -0.539, 

-0.230,  0.020,  0.070,... 

-0.400,  -0.034,  -0.031,... 

-0.508,  -0.188,  -0.182,... 

0.314,  0.375,  0.437, 

0.507,  0.575,  0.593, 

0.671,  0.679,  0.692, 

0.487,  0.634,  0.634 

0.610,  0.732,  0.732 

0.699,  0.798,  0.798 

diagonal 

subdiagonal 

diagonal 

subdiagonal 

diagonal 

subdiagonal 

4.4111 

-1.2754 

4.5383 

-1.3198 

4.6009 

-1.3366 

4.7698 

-1.6842 

5.1029 

-1.8296 

5.2721 

-1.8816 

5.0153 

-1.8785 

5.7456 

-2.3460 

6.2098 

-2.5799 

5.2966 

-1.1143 

6.1396 

-2.4573 

7.0826 

-3.0903 

4.9314 

-1.5875 

6.3366 

-1.4563 

7.5027 

-3.1271 

4.9285 

-1.6675 

5.6482 

-1.9397 

7.5623 

-1.8487 

4.9806,. 

-1.7323A 

5.6718 

-2.0424 

6.4873 

-2.3507 

5.1583  ] 

-1.2442    J 

5.8645 

-2.2516 

6.4556 

-2.4075 

5.1583  J 

-1.7323   / 

6.0232 

-2.2930 

6.7498 

-2.7103 

y 

y 

6.2313 

-1.7199 

7.1391 

-2.9591 

. 

6.0631 

-2.1707- 

7.3354 

-2.9480 

5.9260A 

-2.1821  >j 

7.4991 

-2.2456 

5.9285  1 

-2.1821    J 

7.1410 

-2.6953 

5.9260/ 

y 

7.0021 

7.0845 

7.1929 

7.2359w 

7.3990   J 

7.3990y 

-2.7208 

-2.8095 

-2.8771* 

-2.8434^ 

-2.3548    I 

-2.8434/ 

Table  1.    Optimal  Tridiagonal  Preconditioners  for  A. 


3.3.    Approximation  to  A    '  with  Same  Sparsity  as  A. 

Upon  computing  the  inverse  of  the  matrix  A  of  (3.1),  one  finds  that  the  larger  elements 
of  the  inverse  are  on  or  near  diagonals  in  which  A  has  nonzeros.  It  therefore  seems  reason- 
able to  consider  an  approximation  M_1  to  A-1,  which  has  nonzero  entries  only  in  the  diago- 
nals where  A  has  nonzeros.  If  an  effective  preconditioner  of  this  form  could  be  found,  it 
would  result  in  a  highly  vectorizable  or  parallelizable  algorithm,  since  "solving"  the  precondi- 
tioner would  now  just  mean  multiplying  by  a  sparse  matrix.  The  optimization  code  was  used 
to  find  the  matrix  M~]  having  nonzeros  only  in  the  five  diagonals  where  A  has  nonzeros,  for 
which  k(M ~]A)  was  as  small  as  possible. 

This  turned  out  to  be  a  more  difficult  problem  for  the  optimization  code,  than  either  of 
the  previous  two  cases.  Using  a  very  coarse  mesh,  /»=l/3,  the  optimization  code  returned  a 
solution  with  the  following  caveat 


13 


Apparently  optimal  (or  very  nearly  optimal)  with  non-unique  solution 
(since  a  Lagrange  matrix  nearly  singular) 

The  eigenvalues  of  the  matrix  /-Af_1A,  for  the  computed  matrix  Af_1,  were  all  equal  in  mag- 
nitude, with  three  being  positive  and  one  negative.  Upon  restarting  the  code  from  a  different 
initial  guess,  it  returned  with  the  same  warning  message  but  a  different  optimal  Af  _1.  For 
this  newly  computed  matrix  M~x,  the  eigenvalues  of  l-M ~'A,  were  again  all  equal  in  magni- 
tude, and  of  the  same  magnitude  as  those  previously  computed,  but  this  time  three  were 
negative  and  one  positive.  The  elements  of  the  two  different  optimal  matrices  M~]  are  given 
in  Table  2. 

For  smaller  values  of  h,  the  problem  appears  to  have  a  unique  solution.  The  code  was 
able  to  find  and  identify  as  such  the  optimal  A/-1,  for  h  =1/5.  For  /i  =  1/4,  1/6,  and  1/7, 
attempts  at  finding  the  minimum  resulted  in  the  code  halting  with  the  message 

radius  too  small 

indicating  that  its  trust  region  radius  had  been  reduced  below  the  machine  precision  and  it 
had  been  unable  to  find  a  descent  direction.  Restarting,  in  the  case  /i=l/4,  resulted  in  the 
code  finding  approximately  the  same  "solution",  but  this  time  giving  the  message  "Apparently 
optimal  (or  very  nearly  optimal)  with  non-unique  solution."  In  the  case  h  =  Vt,  several  res- 
tarts resulted  in  the  code  finding  approximately  the  same  "solution",  but  still  halting  because 
the  trust  region  radius  was  too  small.  A  restart  in  the  case  h  =1/7,  however,  resulted  in  the 
code  finding  a  significantly  different  approximation  and  identifying  it  as  optimal.  The  spec- 
tral radius  of  I-M ~'A  for  this  "truly"  optimal  M~x  was  .680,  compared  to  .685  for  the  A/"1 
at  which  it  stopped  the  first  time  because  of  a  too  small  radius.  It  is  believed  that  the  spec- 
tral radii  returned  by  the  code,  for  the  different  sizes  of  h,  are  all  near  optimal,  though  the 
actual  matrices  M_1  may  be  significantly  further  from  the  optimal  ones. 

An  approximation  to  A-1  that  has  been  suggested  in  the  literature  [8,16]  is  the  follow- 
ing.   If  we  write  A  in  the  form 


A  =  DI/2(/-G)D 


1/2 


where  D  is  a  diagonal  matrix,  and  if  the  spectral  radius  of  G  is  less  than  1  (which  it  is  for  this 
problem),  then  A-1  is  given  by 


,4-1  =  £>-i^(/  +  G  +  G2+...)Z>-1/2. 

An  approximation  to  A-1  is  obtained  by  retaining  just  a  finite  number  of  terms  in  the  infinite 
Neumann  series  above.  The  approximation  can  be  improved  by  multiplying  each  term 
retained  by  an  appropriate  constant  [16].  If  only  one  term  is  retained,  then  this  approxima- 
tion has  the  same  sparsity  pattern  as  A.    Thus,  A-1  can  be  approximated  by 

Mf'  =  D-^(c0/  +  c1G)D-'/2.  (3.3.1) 

When  A  is  the  5-point  Laplacian,  the  optimal  constants  are  just  <r0  =  c,=  1. 

The  condition  number  of  A,  that  of  A/f'A,  where  Af  f1  is  defined  by  (3.3.1),  and  that  of 
Af-1A,  where  Af _1  is  the  optimal  preconditioner  having  nonzeros  only  in  diagonals  where  A 
has  nonzeros  (or,  at  least,  the  preconditioner  returned  by  the  optimization  code)  are  plotted 
in  Fig.  2.    Note  that  the  condition  number  of  Af  f'A  is  almost  the  same  as  that  of  Af-1A  for 
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all  values  of  h\  It  should  be  noted,  however,  that  in  cases  where  the  code  was  able  to  find  a 
matrix  which  it  identified  as  optimal,  it  was  significantly  different  from  the  matrix  in  (3.3.1). 
The  condition  number  of  the  preconditioned  matrix  was  very  nearly  the  same,  but  the  eigen- 
values of  the  optimally  preconditioned  matrix  tended  to  cluster  somewhat  more  at  the  ends 
and  less  in  the  middle.  For  the  /i=l/5  case,  for  example,  for  the  optimal  Af~',  the  matrix 
l-M~xA  had  5  eigenvalues  equal  to  -.474,  3  eigenvalues  equal  to  +.474,  and  no  repetitions 
among  the  remaining  interior  eigenvalues.  Using  the  matrix  of  (3.3.1),  /-Mf'A  had  4  eigen- 
values equal  to  -.486,  4  equal  to  -.393,  2  equal  to  -.344,  4  equal  to  -.022,  and  2  equal  to 
+  .486. 

It  is  known  that  a  polynomial  preconditioner  of  the  form  (3.3.1)  can  reduce  the  number 
of  conjugate  gradient  steps  (over  the  number  required  with  no  preconditioner)  by  no  more 
than  a  factor  of  2  (one  plus  the  degree  of  the  polynomial  in  G)  [8].  It  is  also  known  that 
when  A  is  the  5-point  Laplacian,  for  small  values  of  h,  the  condition  number  of  the  matrix 
A/fU  is  approximately  1/4  that  of  A.  This  is  not  apparent  from  Fig.  2,  however,  because 
only  very  coarse  grid  sizes  are  shown.  Because  the  asymptotic  behavior  of  preconditioner 
(3.3.1)  cannot  be  predicted  from  the  figure,  we  are  wary  of  predicting  the  behavior  of  the 
optimally  preconditioned  matrix,  based  on  these  results.  From  the  figure,  it  would  appear 
that  the  condition  number  of  the  optimally  preconditioned  matrix  is  still  0(h~2),  but  testing 
on  larger  problems  is  needed  to  see  if  this  trend  continues. 


eigenvalues 
-1.429,  1.429,  1.429,  1.429 

eigenvalues 
-1.429,  -1.429,  -1.429,  1.429 

diag 

1st  subdiag 

2nd  subdiag 

diag 

1st  subdiag 

2nd  subdiag 

2.857e-01 
2.500e-01 
2.500e-01 
2.857e-01 

7.143e-02 
3.571e-02 
7.143e-02 

7.143e-02 
7.143e-02 

2.857e-01 
3.214e-01 
3.214e-01 
2.857e-01 

7.143e-02 
3.571e-02 
7.143e-02 

7.143e-02 
7.143e-02 

Table  2.   Two  "Optimal"  Matrices  M~x  with  Same  Sparsity  Pattern  as  A  (A  =  l/3). 


One  experiment  was  performed  in  which  the  approximation  to  A~x  was  allowed  to  have 
nonzeros  in  extra  diagonals  —  the  same  diagonals  in  which  the  second  degree  polynomial 
preconditioner 


Mf1  =  erf  +  CjG  +  c2G2 


has  nonzeros.  In  this  case,  the  code  again  stopped  with  the  message  "radius  too  small,"  so  it 
is  not  known  how  close  it  came  to  finding  the  optimal  preconditioner  of  the  given  sparsity 
pattern.  Still,  the  matrix  M~l  returned  by  the  optimization  code  was  considerably  better  than 
the  polynomial  preconditioner  M{x  with  the  optimal  coefficients,  c0,  Cj,  c2.  For  h=\/S,  the 
spectral  radius  of  I-M~XA  was  .208,  compared  to  .260  for  I-M£XA. 
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3.4.    Freconditioners  for  the  Schur  Complement  C. 

In  this  set  of  experiments  we  found  optimal  preconditioned  of  various  forms  for  the 
Schur  complement  matrix  C  defined  in  (3.3).  The  condition  number  of  the  matrix  C  is 
0(/t_1)  and,  in  fact,  its  entire  eigendecomposition  can  be  derived  using  Fourier  analysis  [3]. 
Based  on  this,  one  can  determine  analytically  the  optimal  Toeplitz  tridiagonal  preconditioner 
[24,4]  (since  any  Toeplitz  tridiagonal  matrix  has  the  same  eigenvectors  as  C),  and  it  can  be 
shown  that  this  preconditioned  matrix  has  condition  number  0(h~17).  With  the  optimization 
code,  we  were  able  to  determine  the  optimal  tridiagonal  preconditioner,  without  requiring 
that  it  be  Toeplitz.  This  turned  out  to  be  a  relatively  easy  problem  for  the  optimization  code, 
with  the  code  returning  a  solution  which  it  identified  as  optimal,  for  all  values  of  h  tested.  It 
was  observed,  however,  that  especially  for  the  smaller  values  of  h,  the  optimal  tridiagonal 
preconditioner  for  C  was  very  nearly  Toeplitz.  In  Fig.  3  are  plotted  the  condition  numbers 
of  C,  of  K^C ,  where  KJ3  is  as  defined  in  (3.2),  of  T'XC ,  where  T  is  the  optimal  Toeplitz  tri- 
diagonal preconditioner  for  C,  and  of  M  ~'C,  where  M  is  the  optimal  tridiagonal  precondi- 
tioner for  C,  returned  by  the  optimization  code.  Note  that  the  curves  for  k(7"_1C)  and  for 
k(M"'C)  are  almost  identical.  In  Fig.  3b,  these  two  curves  only  are  plotted  versus  h~ia. 
Based  on  this  evidence,  we  make  the  following  conjecture: 

Conjecture  3.  Let  Ch  be  the  Schur  complement  matrix  corresponding  to  a  center  dividing 
line  for  the  5-point  Laplace  matrix  on  a  grid  of  size  h,  as  defined  in  (3.3).  Let  Mh  be  any 
symmetric  positive  definite  tridiagonal  preconditioner  for  Ch.  Then  the  condition  number 
K(M^lCh)  of  the  preconditioned  matrix  satisfies 


The  next  experiment  was  to  find  the  optimal  (dense)  Toeplitz  preconditioner  for  C. 
This  turned  out  to  be  a  very  difficult  problem  for  the  optimization  code  because  of  very  tight 
clustering  of  many  eigenvalues  at  the  extremes.  The  first  attempt  at  solving  this  problem  for 
most  values  of  h  resulted  in  the  code  halting  with  the  message  "radius  too  small,"  as 
described  previously.  Restarting  the  code  with  slightly  different  options,  however,  resulted 
in  its  identifying  essentially  the  same  "solution"  as  optimal. 

The  condition  number  of  the  optimally  preconditioned  matrix  is  plotted  in  Fig.  4,  along 
with  that  of  the  matrix  preconditioned  by  a  Toeplitz  preconditioner  proposed  by  Golub  and 
Mayers  [12].  Although  there  is  a  large  relative  difference  in  these  condition  numbers,  both 
are  so  small  that  this  difference  is  not  really  significant.  The  eigenvalues  of  both  precondi- 
tioned matrices  are  plotted  in  Figs.  4b-c,  for  the  case  A  =  1/16.  For  both  preconditioners 
there  is  considerable  clustering  of  the  eigenvalues,  but  it  is  especially  pronounced  for  the 
Toeplitz  preconditioner  returned  by  the  optimization  code. 


3.5.    Preconditioners  of  the  Form  KKT,  Where  K  Has  the  Same  Sparsity  as  the  Lower  Tri- 
angle of  A. 

Several  well-known  preconditioners  —  e.g.,  the  incomplete  Cholesky  (IC)  and  modified 
incomplete  Cholesky  (MIC)  decompositions  [19,15],  and  the  symmetric  successive  overtaxa- 
tion (SSOR)  preconditioner  [25]  —  are  of  the  form  KK7 ,  where  K  is  a  lower  triangular  matrix 
with  the  same  sparsity  pattern  as  the  lower  triangle  of  A.  As  mentioned  earlier,  the  problem 
of  finding  the  optimal  preconditioner  of  this  form  is  not  one  of  minimizing  a  convex  function, 
and  so  the  "solution"  returned  by  the  optimization  code  may  not  be  the  global  minimum. 
The  code  can,  however,  be  used  to  find  a  local  minimum  for  the  spectral  radius  of  (3.4), 
when  K  is  restricted  to  have  the  same  sparsity  pattern  as  the  lower  triangle  of  A.  This  also 
turned  out  to  be  a  difficult  problem  for  the  optimization  code,  and  in  all  cases,  it  reached  a 
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point  at  which  it  could  go  no  further  because  its  trust  region  radius  had  been  reduced  below 
the  precision  of  the  machine.  Still,  the  preconditioner  KKT  returned  by  the  optimization  code 
was  significantly  better  than  those  proposed  in  the  literature.  Condition  numbers  for  the 
matrix  preconditioned  by  the  IC  decomposition,  the  MIC  decomposition,  and  the  SSOR 
preconditioner  are  plotted  in  Fig.  5,  along  with  that  of  the  matrix  preconditioned  by  KKT, 
where  K  is  the  matrix  returned  by  the  optimization  code.  It  remains  to  be  seen  how  much 
further  this  condition  number  can  be  reduced. 


3.6.    Preconditioners  of  the  Form  KKT,  Where  K  has  the  Same  Sparsity  as  the  Lower  Tri- 
angular Factor  of  the  Hierarchical  Basis  Function  Preconditioner. 

A  recently  proposed  idea  for  reducing  the  condition  number  of  the  matrix  A  of  a  finite 
element  approximation  is  to  use  different  basis  functions  to  represent  the  finite  element 
space.  One  such  idea  --  the  hierarchical  basis  function  representation  [26]  --  yields  a  matrix  A 
whose  condition  number  is  0(log  h'1)2  rather  than  0(h'2)  for  the  standard  nodal  basis 
representation.  This  idea  can  also  be  thought  of  as  solving  the  original  nodal  basis  matrix  A 
using  a  special  preconditioner.  The  preconditioner  is  of  the  form  KKT ,  where,  if  nodes  are 
numbered  appropriately,  K  is  a  lower  triangular  matrix  with  a  certain  sparsity  pattern  based 
on  the  hierarchy  of  grids  and  the  type  of  finite  element.  The  5-point  operator  A  that  we  are 
considering  corresponds  to  a  linear  finite  element  approximation  on  a  regular  triangular  grid. 


■ 


A  hierarchy  of  grids  and  basis  functions  for  the  space  of  continuous  piecewise  linear 
functions  on  the  given  grid  can  be  defined  as  follows.  Start  with  a  coarsest  grid  which  has 
only  one  interior  node  and  take  the  first  basis  function  to  be  linear  in  each  zone  of  this  coar- 
sest grid  and  to  have  value  one  at  the  interior  node  and  zero  at  all  the  boundary  nodes: 


Divide  each  triangle  of  this  grid  into  four  congruent  triangles  by  connecting  the  midpoints  of 
each  side.  For  each  new  interior  node,  take  a  basis  function  that  is  linear  in  each  zone  of  this 
finer  grid  and  has  value  one  at  that  node  and  zero  at  all  other  nodes  of  this  finer  grid. 


/ 

/ 

V 

/ 

Create  an  even  finer  grid  by  connecting  the  midpoints  of  the  sides  of  each  of  these  triangles 
and  adding  basis  functions  similarly  for  each  new  interior  node.  Continue,  until  the  finest 
level  grid  is  reached.  Any  continuous  piecewise  linear  function  on  this  finest  grid  can  be 
represented  as  a  linear  combination  of  these  hierarchical  basis  functions, 
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v(x<y)  =    2   &j^j(x>y)  <        ^;(xi>)  =  hierarchical  basis  function  which  is  1  at  node  j, 
;»1 

just  as  it  can  be  represented  as  a  (different)  linear  combination  of  the  usual  nodal  basis  func- 
tions (piecewise  linear  functions  which  are  one  at  one  node  and  zero  at  all  other  nodes  of  the 
finest  grid), 


v(x>y)  =    ^laj<i>j(x<y)  >        4>j(x>y)  ~  nodal  basis  function  which  is  1  at  node  j. 


Solving  the  matrix  KT  means  translating  from  hierarchical  to  nodal  basis  coefficients. 
That  is,  if  a  vector  3  contains  the  coefficients  of  a  piecewise  linear  function  with  respect  to 
the  hierarchical  basis  functions  then  a  =  K~T$  is  the  vector  of  coefficients  of  the  same  piece- 
wise  linear  function  with  respect  to  the  usual  nodal  basis  functions.  To  solve  the  linear  sys- 
tem KTa  =  P,  one  starts  with  the  coarsest  grid  and  takes  the  component  of  a  at  that  point  to 
be  the  same  as  the  corresponding  component  of  3-  One  then  moves  to  the  next  finer  grid 
and  takes  the  component  of  a  at  each  new  node  to  be  the  component  of  3  at  that  node  plus 
the  average  of  the  components  of  a  at  each  of  the  surrounding  coarser  grid  nodes.  Continu- 
ing in  this  way  to  the  finest  grid,  one  obtains  the  vector  of  coefficients  of  the  nodal  basis 
functions.  It  follows  that  if  the  coarsest  grid  point  is  numbered  last,  preceded  by  those  on 
the  next  finer  grid,  etc.,  with  the  points  that  lie  only  on  the  finest  grid  numbered  first,  then 
KT  is  an  upper  triangular  matrix  with  nonzero  elements  on  its  main  diagonal  and  in  positions 
corresponding  to  the  couplings  between  nodes  on  one  grid  level  and  their  nearest  neighbors 
on  the  next  coarser  grid  level.  Note  that  the  couplings  are  not  between  nearest  neighbors  on 
the  finest  grid,  as  they  are  in  the  nodal  basis  matrix  A. 

One  might  ask  how  small  a  condition  number  can  be  obtained  with  a  preconditioner  of 
this  form.  Again,  since  it  is  the  sparsity  pattern  of  the  triangular  factors  that  is  fixed,  the 
optimization  code  cannot  necessarily  find  the  optimal  preconditioner  of  this  form.  It  can, 
however,  give  an  upper  bound  on  the  minimal  condition  number. 

When  the  triangular  factors  K  and  KT  are  multiplied  together,  they  produce  a  precondi- 
tioner M  which  can  have  nonzeros  only  in  certain  positions.  (Af,;  can  be  nonzero  only  if,  for 
some  k,  Kik  and  Kjk  are  both  nonzero.)  One  might  also  ask  how  small  a  condition  number 
can  be  obtained  with  a  preconditioner  M  having  this  sparsity  pattern.  Clearly,  this  gives  a 
lower  bound  for  the  minimal  condition  number  obtainable  from  triangular  factors  having  the 
sparsity  of  K  and  KT .  The  optimization  code  can  be  used  to  solve  this  problem,  and  since  the 
function  to  be  minimized  is  convex,  if  the  code  finds  a  minimum,  then  it  is  the  global 
minimum. 

In  Table  3,  the  condition  number  of  the  matrix  preconditioned  by  the  hierarchical  basis 
preconditioner  is  listed,  along  with  upper  and  lower  bounds  on  the  minimal  condition  number 
obtainable  from  a  preconditioner  of  the  form  KKT,  where  K  has  the  same  sparsity  pattern  as 
the  lower  triangular  factor  of  the  hierarchical  basis  preconditioner.  To  avoid  ambiguity  in 
defining  the  hierarchy  of  grids,  we  considered  only  grid  sizes  which  are  powers  of  2;  i.e., 
h=  1/4  and  h  =1/8.  For  the  A  =  1/8  problem,  when  the  sparsity  of  M  was  specified  (to  obtain  a 
lower  bound  on  the  minimal  condition  number  when  the  sparsity  of  K  is  specified),  the 
optimization  code  did  not  find  a  point  which  it  identified  as  optimal.  It  stopped  because  of  a 
too  small  trust  region  radius,  and  attempts  at  restarting  resulted  in  its  finding  approximately 
the  same  "solution",  but  again  failing  to  identify  it  as  optimal.  Hence,  this  entry  in  the  table 
is  considered  somewhat  uncertain.  Although  the  optimization  code  shows  that  the  hierarchi- 
cal basis  function  preconditioner  can  clearly  be  improved  upon  for  the  grid  sizes  shown,  this 
is  of  questionable  importance.  It  is  only  for  finer  grids  that  the  hierarchical  basis  function 
preconditioner  is  really  expected  to  be  efficient.    At  these  coarse  grid  sizes,  the  condition 
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number  of  the  preconditioned  matrix  is  growing  much  faster  than  0(log  h~1)2,  so  it  is  not 
clear  that  improvements  seen  at  these  coarser  grid  levels  will  also  be  possible  for  finer  grids. 
Unfortunately,  it  is  currently  impractical  to  use  the  optimization  code  for  finer  grid  sizes. 


K(z.-'jr^rL-r), 

K  from  optimization  code 

M  from  optimization  code 

A-l 

Hierarchical  Basis 
Cond.  No. 

Upper  Bound  on 
Minimal  Cond.  No. 

Lower  bound  on 
Minimal  Cond.  No. 

4 

4.56 

3.00 

3.00 

8 

10.59 

5.27 

4.86  ? 

Table  3.   Condition  Number  for  Hierarchical  Basis  Preconditioner   and   Upper  and  Lower 
Bounds  on  Condition  Number  for  the  Optimal  Preconditioner  of  the  Same  Form. 


4.    Experimental  Results  for  Other  Diffusion-Type  Operators. 

To  determine  if  other  diffusion-type  elliptic  operators  could  be  preconditioned  more  or 
less  effectively  than  the  Laplacian,  we  attempted  to  compute  optimal  preconditioners  of  some 
of  these  same  forms  for  several  other  operators.    The  problems  were  all  of  the  form 


VpVu=/,       on  (0,1)  x  (0,1), 


«(x,0)  =  «(x,l)  =  «(0,y)  =  «(l,y)  =  0, 


where  the  diffusion  coefficient  p  was  varied.   The  values  of  p  considered  were: 


p(x,y)  =   1.        (as  in  the  previous  section) 


p(x,y)  =   .01  +  x*  +  y2. 


(a.) 
(b.) 


(l        if  x  <  . 

>u»y;  -  hoo  ifx  &  . 


(c) 


p(*.y)  = 


1  if  x  <  .5  and  y  <  .5 

104  if  x  :>  .5  and  y  <  .5 

10~4  if  x  <  .5  and  y  s  .5 

5  if  x  &  .5  and  ya.5 


(d.) 


To  avoid  ambiguity  in  differencing  around  the  discontinuities  in  p,  a  continuous  piecewise 
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linear  finite  element  approximation  was  used  to  generate  the  matrix  A  corresponding  to  each 
of  these  operators. 


4.1.    Diagonal  Preconditioned  for  A. 

Although  a  highly-varying  diffusion  coefficient,  such  as  that  in  (d),  results  in  a  very 
badly  conditioned  finite  difference  or  finite  element  matrix,  it  has  long  been  known  that  a 
simple  diagonal  scaling  could  greatly  reduce  this  condition  number.  In  Fig.  6  are  plotted  the 
condition  numbers  of  the  matrices  for  these  four  problems,  after  they  have  been  scaled  by 
their  diagonals.  According  to  the  Forsythe&Strauss  theorem,  tnis  is  the  optimal  diagonal 
preconditioner  for  these  matrices.  Note  from  the  figure  that  after  diagonal  scaling,  the  con- 
dition numbers  for  all  of  these  matrices  are  nearly  the  same,  with  case  (b)  actually  being 
somewhat  better  conditioned  than  the  Laplacian  (a).  In  cases  (c)  and  (d),  this  condition 
number  jumps  slightly  when  the  mesh  size  Vh  is  not  even.  In  this  case,  the  finite  element 
equations  are  a  poor  approximation  to  the  differential  equation,  anyway,  because  the  discon- 
tinuity of  p,  and  hence  Vu,  occurs  within  a  mesh  cell. 

It  is  also  interesting  to  note  that,  although  the  optimal  diagonal  preconditioners  for 
these  problems  are  known,  the  optimization  code  had  a  great  deal  of  difficulty  in  finding  the 
solutions  for  cases  (c)  and  (d).  For  the  /i  =  1/4  grid,  starting  from  an  initial  guess  that  was 
equal  to  twice  the  true  solution,  the  code  required  272  iterations  to  find  the  optimal  diagonal 
preconditioner  for  case  (c),  compared  to  just  12  for  case  (b).  In  case  (d),  the  code  was 
stopped  after  600  iterations  when  only  negligible  improvement  had  been  made  over  the  initial 
guess.  The  reasons  for  this  difficulty  are  currently  being  investigated  to  determine  if  some 
rescaling  or  other  modification  of  the  problem  can  make  the  optimization  job  easier. 


4.2.    Tridiagonal  Preconditioners  for  A. 

Because  a  simple  diagonal  scaling  results  in  matrices  that  are  about  equally  well- 
conditioned,  one  might  expect  that  any  class  of  preconditioners  which  is  broad  enough  to 
contain  all  diagonal  matrices,  e.g.,  tridiagonal  preconditioners,  would  be  about  as  effective 
for  one  of  these  operators  as  for  another.  This  did  not  turn  out  to  be  the  case,  however. 
The  optimal  tridiagonal  preconditioner  in  cases  (a)  and  (b)  gave  a  significantly  smaller  condi- 
tion number  than  in  cases  (c)  and  (d).  The  optimization  code  again  had  difficulty  in  finding 
the  optimal  preconditioners  for  cases  (c)  and  (d),  but  after  many  iterations  it  finally  did  find 
solutions  which  it  identified  as  optimal.  These  condition  numbers  are  plotted  in  Fig.  7.  The 
point  h  =175  was  not  included  in  cases  (c)  and  (d),  since  the  difference  equation  is  not  a  good 
approximation  to  the  differential  equation  in  this  case  (since  the  discontinuity  of  p  occurs 
within  a  zone).  For  grids  containing  mesh  lines  along  the  discontinuities  of  p,  we  expect  that 
the  condition  number  for  all  of  these  optimally  preconditioned  operators  will  grow  as 
0(*"2). 


4.3.    Preconditioners  of  the  Form  DAD,  Where  A  is  the  Laplacian  and  D  is  Diagonal. 

It  was  recently  proved  that  the  matrix  A  from  an  arbitrary  second-order  self-adjoint 
elliptic  partial  differential  equation  can  be  preconditioned  by  the  matrix  A  corresponding  to 
the  Laplacian  on  the  same  grid  and  with  the  same  boundary  conditions  so  that  the  resulting 
preconditioned  matrix  has  condition  number  0(1),  independent  of  the  mesh  size  [17].  Since 
the  Laplacian  is  relatively  easy  to  solve  on  a  rectangular  grid  (and  with  an  integral  equation 
formulation,  perhaps  also  on  an  irregular  grid  [18]),  this  might  make  an  effective  precondi- 
tioner. Unfortunately,  however,  the  constant  in  the  condition  number  bound  can  be  quite 
large,  and  it  is  large  for  problems  (c)  and  (d).  One  might  ask  if  there  is  a  simple  modifica- 
tion of  the  Laplacian  that  would  still  be  easy  to  solve  and  would  give  a  condition  number  that 
is  not  only  0(1)  but  has  a  small  constant  as  well.    The  simplest  idea  is  to  scale  the  Laplacian 
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by  some  diagonal  matrix  D  and  use  the  symmetric  matrix  DAD  as  the  preconditioner.  This  is 
the  same  form  of  preconditioner  that  was  studied  by  Concus  and  Golub  [5],  who  determined 
the  optimal  such  preconditioner  for  1-D  problems. 

We  used  the  optimization  code  to  find  the  diagonal  matrix  D  for  which  the  spectral 
radius  of  /  -  L_1DADL_r  was  as  small  as  possible,  where  A=LLT,  was  the  matrix  arising 
from  problems  (b-d).  Again,  the  optimization  code  had  difficulty  with  problem  (d),  so  we 
omitted  this  from  our  results.  In  Fig.  8  are  plotted  the  condition  numbers  for  problems  (b) 
and  (c),  preconditioned  by  the  optimally  scaled  Laplacian-type  preconditioner,  DAD.  Com- 
putations for  case  (c)  were  performed  on  somewhat  finer  grids  than  for  case  (b),  as  will  be 
explained  below.  Although  the  condition  number  for  each  of  these  preconditioned  matrices 
is  bounded  by  a  constant  independent  of  h,  we  have  not  reached  a  small  enough  value  of  h  to 
determine  this  constant.  Clearly,  problem  (b)  is  very  well-conditioned  by  the  appropriate 
diagonal  scaling  of  the  Laplacian,  and  problem  (c)  is  also  reasonably  well  approximated  by 
such  a  preconditioner,  for  the  grid  sizes  shown. 

The  optimal  diagonal  matrix  determined  in  problem  (c)  was  especially  interesting.  If 
nodes  to  the  left  of  the  discontinuity  in  p  are  numbered  first,  then  nodes  to  the  right  of  the 
discontinuity,  and  finally  nodes  on  the  discontinuity,  then  the  optimal  diagonal  scaling  for  the 
Laplacian-type  preconditioner,  as  determined  by  the  optimization  code,  has  the  following 
form: 


D  = 


d,l 


d-,1 


dd 


where  d,,  i  =  1,2,3  are  constants  and  each  block  corresponds  to  one  of  the  above  mentioned 
subregions.  Using  this  ordering  of  nodes,  the  matrix  A  arising  from  problem  (c)  has  the 
form 


A  = 


ClKu       0       ClKl3 

0         ^2     22    ^2     23 
C\KL    c2Kh    c3*33 


where  the  blocks  K,j,  ij=  1,2,3  are  the  blocks  of  the  Laplacian,  and  the  constants  c,,  /'  =  1,2,3 
are 

c,  =  1,      c2  =  100,     c3  =  50.5  . 
The  preconditioner  DAD  returned  by  the  optimization  code  then  has  the  form 


M  = 


d}Kn  0         d,d,Kn 

0  d\Kii       ^2^3^23 

dxd%K\^    d2d3Kh     djK33 


(4.3.1) 


The  constants  d,,  i=  1,2,3,  are  listed  in  Table  4,  for  grid  sizes  1/A  =  4,6,8.  Seeing  this  pattern, 
we  were  able  to  go  to  finer  grid  sizes  by  restricting  the  preconditioner  to  be  of  the  form 
(4.3.1),  and  having  the  optimization  code  determine  only  the  best  constants  d,,  i=  1,2,3. 
These  are  also  listed  in  Table  4,  for  Vh  =  10,16,  and  the  corresponding  condition  numbers  are 
plotted  in  Fig.  8.    The  condition  number  of  the  preconditioned  matrix  still  has  not  reached  its 
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asymptotic  limit,  however,  and  we  do  not  know  what  this  limit  is.  It  is  interesting  to  note 
that  at  grid  size  1/>i  =  16,  d2  is  equal  to  d2.  The  difference  between  dx  and  d2  or  dz  is  also 
becoming  smaller  and  probably  approaches  some  asymptotic  limit. 


Vh 

d} 

di 

d, 

4 

1.97 

16.92 

13.52 

6 

1.94 

14.60 

12.77 

8 

1.91 

13.05 

12.08 

10 

1.89 

11.89 

11.49 

16 

1.88 

10.05 

10.05 

Table  4.  Constants  Defining  the  Optimal  Preconditioner  of  the  Form  DAD. 


5.    Conclusions. 

We  have  demonstrated  a  very  useful  tool  in  the  study  of  preconditioners.  Again,  an 
optimization  code  is  not  usually  a  practical  method  for  finding  a  good  preconditioner  for  a 
given  problem,  but  rather  it  is  intended  to  give  insight  into  the  properties  of  preconditioners 
and  the  forms  of  matrices  that  can  or  cannot  be  potentially  effective  preconditioners.  Results 
from  the  code  have  led  to  several  conjectures  and  may  also  contribute  to  the  ultimate  goal  of 
finding  an  easily  solved  preconditioner  that  gives  a  condition  number  that  is  0(1)  and  has  a 
small  constant  factor,  for  a  large  class  of  matrices  arising  from  elliptic  partial  differential 
equations. 
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Fig.  1.  Tridiagonal  Preconditioned  for  A 
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Fig.  2.  Approximations  to  A-inv  with  Same  Sparsity  as  A 
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Fig.  3.  Tridiagonal  Preconditioners  forC 
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Fig.  3b.  Tridiagonal  Preconditioners  forC 
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Fig.  4.  Toeplilz  Preconditioners  for  C 
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Fig.  4b.  Eigenvalues  of  Golub-Mayers  Prcconditioner  (h=  1/1 6) 
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Fig.  4c.  Eigenvalues  of  Optimal  Toeplitz  Preconditioner  (h=  1/1 6) 
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Fig.  5.  Preconditioners  of  the  Form  KKT;K  Has  Sparsity  of  Lower  Tri(A) 
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Fig.  6.  Diagonally  Preconditioned  Elliptic  Operators 
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Fig.  7.  TridiagonalJy  Preconditioned  Elliptic  Operators 
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Fig.  8.  Preconditioned  of  the  Form  DAD,  for  Elliptic  Operators 
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